Skip to content

Conversation

@alexcos20
Copy link
Member

@alexcos20 alexcos20 commented Jan 15, 2026

Changes proposed in this PR:

  • more error handling in compute loop
  • use setTimeout/clearTimeout instead of setInterval/clearInterval

see #1122 for context

@alexcos20 alexcos20 marked this pull request as ready for review January 15, 2026 09:16
@alexcos20
Copy link
Member Author

/run-security-scan

Copy link
Member Author

@alexcos20 alexcos20 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AI automated code review (Gemini 3).

Overall risk: low

Summary:
This pull request significantly enhances the robustness and reliability of the C2D compute engine's internal loop. The core change involves transitioning from setInterval to a setTimeout-based recurring loop, which is a safer pattern for asynchronous tasks as it prevents overlapping executions. Additionally, the InternalLoop function now includes a try...catch...finally block, ensuring that the next iteration of the loop is always scheduled, even if errors occur during job retrieval or processing. This prevents the compute engine from silently failing or getting stuck. Error logging for the internal loop has also been improved.

Comments:
• [INFO][style] Changing setNewTimer from async to synchronous is appropriate here since it no longer awaits getComputeEnvironments() and directly calls setTimeout, which doesn't return a promise that needs awaiting.
• [INFO][performance] The change from setInterval to setTimeout is a crucial improvement for robust asynchronous loops. This pattern ensures that a new InternalLoop execution is only scheduled after the previous one has completed (or failed), preventing overlapping executions and resource contention, especially when job processing times can vary.
• [INFO][other] The check if (this.cronTimer) { return } ensures idempotency if setNewTimer is called externally while a timer is already active. Within InternalLoop, this.cronTimer is explicitly cleared before setNewTimer is called, so it will always schedule a new one there, which is the desired behavior.
• [WARNING][bug] The condition changed from (await this.getComputeEnvironments()).length > 0 to this.envs.length > 0. Please confirm that this.envs is always guaranteed to be accurately populated and up-to-date before setNewTimer is called. If getComputeEnvironments() performs any critical initialization or fetches the latest state, bypassing it could lead to issues where the engine operates with stale or incomplete environment data. If this.envs is just a cached value, this change is a performance optimization and is fine.
• [INFO][bug] The addition of the try...catch...finally block significantly improves the robustness of the InternalLoop. Now, even if an error occurs during fetching jobs or processing them, the engine will log the error and crucially, always reschedule the next loop execution in the finally block. This prevents the C2D engine from silently stopping due to an unhandled error.
• [INFO][style] Correctly adapting the timer clearing logic from clearInterval to clearTimeout is good. The if (this.cronTimer) check also handles cases where cronTimer might already be null without errors.

@alexcos20 alexcos20 mentioned this pull request Jan 15, 2026
@alexcos20 alexcos20 merged commit 9cd4598 into main Jan 15, 2026
19 of 20 checks passed
@alexcos20 alexcos20 deleted the bug/more_error_handling_compute_loop branch January 15, 2026 13:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants